7.4 Compression
that the probabilities are respectively $\frac{1}{2}$, $\frac{1}{4}$, $\frac{1}{8}$, and $\frac{1}{8}$. Then, from Eq. (6.5) we determine $I = 1.75$ bits per symbol, so we should be able to encode the message (whose relative entropy is $\frac{7}{8}$ and hence redundancy $R$ is $\frac{1}{8}$) such that a smaller channel will suffice to send it. The following code may be used:$^{8}$
A   0
B   10
C   110
D   111.
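The figure $I = 1.75$ bits per symbol can be checked numerically; the following is a minimal Python sketch (the variable names are illustrative, not from the text):

```python
from math import log2

# Probabilities of the four symbols of the source (illustrative names).
probs = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}

# Information per symbol, I = -sum p_i log2 p_i, as in Eq. (6.5).
I = -sum(p * log2(p) for p in probs.values())
print(I)  # 1.75 bits per symbol
```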
The average number of binary digits used in encoding a sequence of $N$ symbols will be $N(\frac{1}{2} \times 1 + \frac{1}{4} \times 2 + \frac{2}{8} \times 3) = \frac{7}{4}N$. The digits 0 and 1 can be seen to have equal probabilities; hence, $I$ for the coded sequence is 1 bit/symbol, equivalent to 1.75 binary symbols per original letter. The binary sequence can be translated back into the original alphabet (taking the binary digits in pairs) by the transformation
00   A
01   B
10   C
11   D.
The compression ratio of this process is $\frac{7}{8}$. Note, however, that there is no general method for finding the optimal coding.
Problem. Using the above coding, show that the 16-letter message “ABBAAADABACCDAAB” can be sent using only 14 letters.
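The round trip the problem asks for can be sketched directly from the two tables above (Python; the names are illustrative, not part of the text):

```python
# Encode the 16-letter message with the prefix code above, then regroup
# the binary digits in pairs back into the A-D alphabet.
code = {"A": "0", "B": "10", "C": "110", "D": "111"}
pairs = {"00": "A", "01": "B", "10": "C", "11": "D"}

message = "ABBAAADABACCDAAB"                # 16 letters
bits = "".join(code[ch] for ch in message)  # 28 binary digits

# 28 digits -> 14 pairs -> 14 letters of the original alphabet.
sent = "".join(pairs[bits[i:i + 2]] for i in range(0, len(bits), 2))
print(len(message), len(bits), len(sent))   # 16 28 14
```

Regrouping the 28 binary digits in pairs yields exactly 14 letters, consistent with the compression ratio of $\frac{7}{8}$ quoted above.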
The Shannon technique requires a long delay between receiving symbols for
encoding and the actual encoding, in order to accumulate sufficiently accurate indi-
vidual symbol transmission probabilities. The entire message is then encoded. This
is, of course, a highly impractical procedure. Mandelbrot (1952) has devised a proce-
dure whereby messages are encoded word by word. In this case the word delimiters
(e.g., spaces in English text) play a crucial rôle. From Shannon’s viewpoint, such a
code is necessarily redundant, but on the other hand, an error in a single word renders
only that word unintelligible, not the whole message. It also avoids the necessity for
a long delay before coding can begin.
The Mandelbrot coding scheme has interesting statistical properties. One may presume that the encoder seeks to minimize the cost of conveying a certain amount of information using the collection of words that are at his disposal. If $p_i$ is the probability of selecting and transmitting the $i$th word, then the mean information per symbol contained in the message is, as before, $-\sum p_i \log p_i$. We may suppose that the cost of transmitting a selected word is proportional to its length. If $c_i$ is the cost of transmitting the $i$th word, then the average cost per word is $\sum p_i c_i$. Minimizing this average cost over the probability distribution while keeping the total information constant (using Lagrange's method of undetermined multipliers) yields

$$p_i = C e^{-D c_i}, \qquad (7.4)$$
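For completeness, one way the constrained minimization can be carried out is sketched below (the multipliers $\lambda$ and $\mu$ are introduced here for illustration):

```latex
% Minimize \sum_i p_i c_i subject to -\sum_i p_i \log p_i = H_0
% and \sum_i p_i = 1, via Lagrange multipliers \lambda and \mu.
\begin{align*}
\mathcal{L} &= \sum_i p_i c_i
  + \lambda \Bigl( \sum_i p_i \log p_i + H_0 \Bigr)
  + \mu \Bigl( \sum_i p_i - 1 \Bigr), \\
\frac{\partial \mathcal{L}}{\partial p_i}
  &= c_i + \lambda (\log p_i + 1) + \mu = 0
  \quad\Longrightarrow\quad
  \log p_i = -\frac{c_i + \lambda + \mu}{\lambda},
\end{align*}
```

which is Eq. (7.4) with $C = e^{-(\lambda+\mu)/\lambda}$ and $D = 1/\lambda$.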
$^{8}$ Elaborated by D. A. Huffman.